{ "cells": [ { "cell_type": "markdown", "id": "c28a32e9-e5e7-4f17-8384-bd9e752b340e", "metadata": {}, "source": [ "# DMPs and DMRs" ] }, { "cell_type": "code", "execution_count": null, "id": "58882ee2-6b75-4196-8bbc-b64ae36eee04", "metadata": {}, "outputs": [], "source": [ "from pylluminator.samples import Samples\n", "from pylluminator.visualizations import dmr_manhattan_plot, dmp_heatmap, visualize_gene, show_chromosome_legend\n", "from pylluminator.dm import DM\n", "from pylluminator.utils import save_object\n", "\n", "from pylluminator.utils import set_logger\n", "\n", "set_logger('WARNING') # set the verbosity level, can be DEBUG, INFO, WARNING, ERROR" ] }, { "cell_type": "markdown", "id": "71bcd367-d86f-4f20-b19a-7f8d75e9d39f", "metadata": {}, "source": [ "## Load pylluminator Samples\n", "\n", "We assume that you have already processed the .idat files according to your preferences and saved them. If not, please refer to notebook `1 - Read data and get beta values` before going any further." ] }, { "cell_type": "code", "execution_count": null, "id": "ae0bd9d9-752f-40d7-b93c-0f41f709ba3d", "metadata": {}, "outputs": [], "source": [ "my_samples = Samples.load('preprocessed_samples')" ] }, { "cell_type": "markdown", "id": "c511cfe4-f639-4925-9e15-916633d95b1d", "metadata": {}, "source": [ "Here, we want to filter out the probes on the X or Y chromosomes." ] }, { "cell_type": "code", "execution_count": null, "id": "9000c4d1-6154-4908-b851-8a7f0f91e0a5", "metadata": {}, "outputs": [], "source": [ "my_samples.mask_xy_probes()" ] }, { "cell_type": "markdown", "id": "a19fb496-6ff3-403a-b3c4-9be90d7af6e7", "metadata": {}, "source": [ "To speed up the demo, we will only calculate DMPs and DMRs on 10% of the probes" ] }, { "cell_type": "code", "execution_count": null, "id": "0c2b4526-52a5-45b8-af6e-f1293b66eafb", "metadata": {}, "outputs": [], "source": [ "ten_pct_probes = int(0.1 * my_samples.nb_probes)\n", "probe_ids = my_samples.probe_ids[:ten_pct_probes]\n", "print(f'Selected {ten_pct_probes:,} first probes')" ] }, { "cell_type": "markdown", "id": "433f4b97-cb16-47f9-b4af-215d2d7800dc", "metadata": {}, "source": [ "## Differentially Methylated Probes\n", "\n", "The second parameter needed to create a DM object (here `~ sample_type`) is a R-like formula that describes the model, and is used to create the design matrix. You can use one or more predictors in the formula, e.g. `~age + sex`. The predictors names must be the column names of the sample sheet.\n", "\n", "More info on design matrices and formulas:\n", "- https://www.statsmodels.org/devel/gettingstarted.html\n", "- https://patsy.readthedocs.io/en/latest/overview.html\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c11c9f7d-a965-4780-8063-1b21b5150b62", "metadata": {}, "outputs": [], "source": [ "my_samples.sample_sheet" ] }, { "cell_type": "code", "execution_count": null, "id": "08e98ff2-499b-4194-a702-5be5b79c9bf5", "metadata": {}, "outputs": [], "source": [ "my_dms = DM(my_samples, '~ sample_type', probe_ids=probe_ids)" ] }, { "cell_type": "markdown", "id": "43e4c278-f7f6-4d8e-a27e-83fafdc3eb57", "metadata": {}, "source": [ "You can now plot the results, for the 25 most variable probes:" ] }, { "cell_type": "code", "execution_count": null, "id": "13381265-f0d1-4b9a-b964-c78d7bd6b21c", "metadata": {}, "outputs": [], "source": [ "dmp_heatmap(my_dms, my_dms.contrasts[0], nb_probes=25, figsize=(8, 5))" ] }, { "cell_type": "markdown", "id": "e48af4ef-e130-462f-b758-8758fa758d73", "metadata": {}, "source": [ "## Differentially Methylated Regions" ] }, { "cell_type": "markdown", "id": "c8450c59", "metadata": {}, "source": [ "We can then identify ths DMRs by grouping neighboring probes with similar methylation patterns for a given predictor contrast. Similarity is calculated based on the Euclidean distance between probes’ beta values." ] }, { "cell_type": "code", "execution_count": null, "id": "070e95af-e399-44c7-996d-406a6ea55085", "metadata": {}, "outputs": [], "source": [ "my_dms.compute_dmr(my_dms.contrasts)\n", "save_object(my_dms, 'dms')" ] }, { "cell_type": "code", "execution_count": null, "id": "24b861a6-814f-452c-91f1-67a50ba30490", "metadata": {}, "outputs": [], "source": [ "# get top DMRs and their associated genes for the first contrast, PREC\n", "my_dms.get_top_dmr(my_dms.contrasts[0])" ] }, { "cell_type": "code", "execution_count": null, "id": "405f448c-d44f-46da-9e83-850c3c03a85b", "metadata": {}, "outputs": [], "source": [ "# visualize the DMRs for the first contrast\n", "dmr_manhattan_plot(my_dms, 'sample_type[T.PREC]', nb_annotated_probes=20) # by default, plots the DMRs depending on log(p-values)\n", "# dmr_manhattan_plot(my_dms, 'sample_type[T.PREC]', y_col='sample_type[T.PREC]_avg_beta_delta', log10=False, nb_annotated_probes=20) # example to plot DMR depending on the beta values difference between the two groups" ] }, { "cell_type": "markdown", "id": "382314fe-5f84-4eae-b449-a2249f45a3e9", "metadata": {}, "source": [ "## Gene visualization\n", "\n", "We can then have a look at a particular gene identified as differentially methylated, for example CSF1. The heatmap of the beta values of the probes associated to this gene shows a clear methylation difference between the healthy cells (PrEC) and the prostate cancer cells (LNCAP)." ] }, { "cell_type": "code", "execution_count": null, "id": "775af73e-b5f5-4950-91a3-3f5378aa78a7", "metadata": {}, "outputs": [], "source": [ "show_chromosome_legend() # display the legend for chromosome regions colors, corresponding to Giemsa staining\n", "visualize_gene(my_samples, 'CSF1', figsize=(10, 5))" ] } ], "metadata": { "kernelspec": { "display_name": "pylluminator", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 5 }